It has been known since the early days of molecular biology that proteins locate their specific targets on
DNA up to two orders of magnitude faster than the Smoluchowski 3D diffusion rate. Delbrück suggested
that proteins are non-specifically adsorbed on DNA and that sliding along DNA provides a faster 1D search.
Surprisingly, the role of DNA conformation has never been considered in this context. In this article, we
explicitly address the relative roles of 3D diffusion and 1D sliding along coiled or globular DNA, and the
possibility of correlated re-adsorption of desorbed proteins. We identify a wealth of new scaling regimes.
We also find the maximal possible acceleration of the reaction due to sliding, show that the maximum on the
rate-versus-ionic-strength curve is asymmetric, and show that sliding can lead not only to acceleration but,
in some regimes, to dramatic deceleration of the reaction.



I. INTRODUCTION 
A. The problem 

Imagine that while you are reading these lines a λ-phage injects its DNA into a cell. For the infected cell,
this sets a race against time: its hope of survival depends entirely on the ability of the proper restriction
enzyme to find and recognize the specific site on the viral DNA and then cut it, thus rendering the viral DNA
inoperable and harmless. If the restriction enzyme takes too long to locate its target, the cell is dead.

This is, of course, just an example. Essentially all of molecular biology is about various enzymes operating
on specific places on DNA, and each enzyme must locate its target site quickly and reliably. How can they
accomplish the task? It was recognized very early on that a search by free diffusion through the 3D solution
is far too slow and that proteins somehow do it faster. Indeed, the rate at which diffusing particles find a
target was determined by M. Smoluchowski as early as 1917 [1]; it is equal to $4\pi D_3 b c$, where $b$ is the
target radius, and $D_3$ and $c$ are, respectively, the diffusion coefficient and concentration of the diffusing
particles, in our case proteins (see also appendix A for a simple derivation). Although the Smoluchowski
result sets a rigid upper bound for the possible diffusion-controlled rate, proteins at least in some instances
somehow manage to do it up to about two orders of magnitude faster; see, for instance, [2, 3]. The idea that
resolves this paradox goes back to Delbrück [4], who suggested that proteins can fairly quickly adsorb on a
non-specific random place on DNA, and that 1D sliding along DNA can then be much faster than 3D diffusion.
In fact, the idea that reduced dimension speeds up chemical reactions can be traced even further back to
Langmuir [5], who noticed that adsorption of reagents on a 2D surface can facilitate their finding each other
by diffusion.

The field has attracted intensive attention for many years. Early studies [2, 3] seemed to corroborate the
Delbrück model. A nice recent review of the various strategies employed to address the problem experimentally
can be found in Ref. [6]. Based on a summary of the experimental evidence, the authors of this review conclude
that the process is not just naive 1D sliding, but rather a delicately weighted mixture of 1D sliding over
some distances and 3D diffusion. A theorist could also have guessed the presence of a cross-over between 1D
sliding and 3D diffusion, because sliding along coiled DNA becomes very inefficient at large scales: having
moved by about $t^{1/2}$ along the DNA after 1D diffusion over some time $t$, the protein moves in space by only
$t^{1/4}$ if the DNA is a Gaussian coil. This is very slow subdiffusion. The situation therefore requires
theoretical attention to understand how 3D and 1D diffusion can be combined and how their combination
should be manifested in experiments.
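
The slowdown from $t^{1/2}$ to $t^{1/4}$ can be illustrated with a minimal sketch. It assumes pure 1D diffusion with coefficient `D1` along an ideal Gaussian coil of persistence length `p` and drops all numerical coefficients; the function names are illustrative and do not come from the original work.

```python
import math

def contour_displacement(t: float, D1: float) -> float:
    """Typical distance travelled along the DNA contour after sliding for time t."""
    return math.sqrt(D1 * t)

def spatial_displacement(t: float, D1: float, p: float) -> float:
    """Typical displacement in space for a protein sliding along DNA.

    A piece of contour length s > p has spatial size ~ sqrt(s * p), so the
    displacement in space grows only as t**(1/4) once s exceeds p.
    """
    s = contour_displacement(t, D1)
    if s <= p:
        return s                # DNA is straight below the persistence length
    return math.sqrt(s * p)     # Gaussian coil above it
```

In this sketch, increasing the sliding time sixteen-fold only doubles the displacement in space once the contour distance exceeds `p`.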

On the theoretical front, a major contribution to the field is due to Berg, Winter and von Hippel (BWH) [7].
As an outcome of their theory, these authors formulated the following nice prediction, partially confirmed
by their later in vitro experiments [8]: the rate at which proteins find their specific target site on DNA
depends in a non-monotonic fashion on the ionic strength of the solution. In this context, ionic strength is
believed to tune the strength of non-specific adsorption of proteins on DNA, presumably because a protein
adsorbs to DNA via a positively charged patch on its surface. Thus, in essence one should speak of the
non-monotonic dependence of the rate on the energy of non-specific adsorption of proteins on DNA.

Although qualitatively consistent with experiment, the BWH theory [7] leaves several questions open. First
and foremost, how does the search time of proteins finding their target, or the corresponding rate, depend on
the DNA conformation? In particular, is it important that the DNA is coiled at length scales larger than the
persistence length? Is it important that the DNA coil may not fit in the volume available, so that the DNA
must be a globule, as in the nucleoid of a prokaryotic cell in vivo or under experimental conditions in
vitro [9]? Second, and closely related, the BWH theory [7] does not answer the experimentally most relevant
question [6] of the interplay between 1D sliding and 3D diffusion. In particular, one of the questions raised
by experiments and not answered by the BWH theory [7] concerns the correlations between the place where a
protein departs from DNA and the place where it re-adsorbs. Third, although this is of lesser importance and
more taste-dependent, the BWH theory [7] does not yield a simple intuitive explanation for the non-monotonic
dependence of the rate on the strength of non-specific adsorption, and one may want to know whether there
exists a simple qualitative description of the rate, at least in some limits.

A more recent refinement of the theory is given in Ref. [10]. The authors of this work follow BWH in that
they treat DNA in terms of "domains", a concept that has no unambiguous definition in the physics of DNA.
Also, Ref. [10] makes it very explicit that BWH [7] and subsequent theories neglect correlations between the
place where a protein desorbs from DNA and the place where it adsorbs again, an approximation that clearly
defies the polymeric nature and fractal properties of DNA. At the same time, this approximation leaves
unanswered the experimentally motivated question of the interplay between the 1D and 3D components of the
search process.

In recent years, the problem was revisited by physicists several times [11-13], but the disturbing fact is
that all of them attribute quite different results and statements to BWH: Ref. [11] says that, according to
BWH, the search time scales as the DNA length $L$ rather than $L^2$ as in 1D diffusion along DNA; Ref. [12]
states that proteins slide along DNA over some distance that is independent of DNA conformation, regardless
even of the DNA fractal properties; Ref. [13], although it concentrates on the role of the non-uniform DNA
sequence, claims that the time for 3D diffusion must be about the same as the time for 1D diffusion along
DNA. A further, possibly even more disturbing, fact is that none of the papers [7, 10, 11, 13] makes any
clearly articulated explicit assumption about DNA conformation. Is it straight, or a Gaussian coil with the
proper persistence length, or what? Does the result depend on the DNA conformation? Interestingly,
experimenters do discuss in their works (see [6] and references therein) the issue of correlated vs.
uncorrelated re-adsorption. These discussions call for theoretical attention and a theoretical description in
terms of correlations in fractal DNA, but so far no proper theory has been suggested.

Motivated by these considerations, in this work we set out to re-examine the problem from the very
beginning. We explicitly take into account that DNA is fairly straight at length scales smaller than the
persistence length and is a Gaussian coil at larger length scales. We also consider the possibility that the
DNA is confined within a volume into which the Gaussian coil does not fit (as it does not fit into a typical
prokaryotic cell, for instance), in which case the DNA must be a globule.



B. Model, approach, and limitations 

We assume that a (double-helical) DNA with contour length $L$, persistence length $p$, and a target site of
size $b$ is confined within some volume $v$.

We further assume that a protein can be non-specifically adsorbed at any place on the DNA, and that the
non-specific adsorption energy $\epsilon$, or the corresponding constant $y = e^{\epsilon/k_BT}$, is the same
everywhere on the DNA and does not depend on the DNA sequence. We assume that every protein molecule has
just one site capable of adsorbing on the DNA. There are proteins with two such sites; they can adsorb on two
separate pieces of DNA at the same time and thus serve as a cross-linker for the DNA itself. We do not
consider this possibility in this article.

We assume that there is only one molecule of DNA. In reality, a macroscopic sample of DNA solution at a
certain concentration is used in any in vitro experiment. From the theoretical standpoint, a DNA solution
with a concentration of $1/v$ (in units of DNA chains per unit volume) is equivalent to the system of one
DNA considered here. We also assume that the DNA has only one target site on it, which is not always true in
reality [6].

We assume that a non-specifically bound protein can diffuse (slide) along DNA with the diffusion coefficient
$D_1$, while a protein dissolved in the surrounding water diffuses in 3D with the diffusion coefficient $D_3$.
Thus, we have a dimensionless parameter related to the diffusion coefficients, $d = D_1/D_3$. In the simpler
version of the theory, which we shall consider first, we assume $D_1 = D_3$, or $d = 1$. For simplicity, we
assume that while the protein is diffusing, either in 3D or along the DNA, the DNA itself remains immobile.

The quantity of interest is the time needed for the target site to be found by a protein (consider, e.g.,
the example of a restriction enzyme attacking a viral DNA intruder). One should imagine a certain
concentration $c$ of proteins randomly introduced into the system, and ask what time is needed for the first
of these proteins to arrive at the target site. In this paper, we address only the mean time, averaged over
both thermal noise and DNA conformation. For this averaged quantity, since the DNA is assumed immobile, the
problem can be addressed in a simple way, by looking at the stationary rate. Namely, we consider that there
is a sink of proteins at the specific target site, and that it consumes proteins at the rate $J$ proportional
to the concentration $c$, which is maintained at a constant level by an influx that preserves stationarity.
Obviously, then, the averaged time is just $1/J$. At the end of the paper, in section V A, we show how to
re-derive all our results in terms of a single protein, thus avoiding the artificial assumption that there is
a sink of proteins at the place of the target.

In this article, we calculate the rate $J$ assuming the concentration $c$ to be an arbitrary constant. In
order to compare the predicted rate to the Smoluchowski rate $J_s = 4\pi D_3 c b$, we shall mainly look at
the ratio

$$\frac{J}{J_s} = \frac{J}{4\pi D_3 c b} \sim \frac{J}{D_3 c b}, \tag{1}$$

which characterizes the acceleration of the reaction rate achieved due to the sliding along DNA.
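
As a minimal sketch of the reference quantities, the Smoluchowski rate and the ratio of Eq. (1) can be written as follows. The function names and the numerical values are illustrative assumptions, not taken from the text; the numbers are only meant to show the units involved.

```python
import math

def smoluchowski_rate(D3: float, c: float, b: float) -> float:
    """Diffusion-limited rate J_s = 4*pi*D3*c*b for a target of radius b."""
    return 4.0 * math.pi * D3 * c * b

def acceleration(J: float, D3: float, c: float, b: float) -> float:
    """Ratio J/J_s of Eq. (1); values above 1 mean faster than 3D diffusion."""
    return J / smoluchowski_rate(D3, c, b)

# Illustrative numbers only (assumptions, not taken from the text):
D3 = 1.0e-10   # m^2/s, protein diffusion coefficient in water
b = 1.0e-9     # m, target size
c = 1.0e18     # 1/m^3, about one protein per cubic micrometre
J_s = smoluchowski_rate(D3, c, b)
mean_time = 1.0 / J_s   # mean search time is the inverse of the stationary rate
```

Since only the ratio $J/J_s$ enters the scaling results, the factor $4\pi$ drops out of everything that follows.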

We will be mainly interested in the scaling dependence of the rate $J$ or the acceleration $J/J_s$ on the
major system parameters, such as $y$, $L$, and $v$. In this context, we use the symbol "$\sim$" to mean
"equal up to a numerical coefficient of order one", while the symbols $>$ and $<$ mean $\gg$ and $\ll$,
respectively.

Along with dropping all numerical coefficients in our scaling estimates, we also make several assumptions
driven purely by the desire to make the formulae simpler and to clarify the major physical ideas. We assume
that all the "microscopic" length scales are of the same order, namely about the target size $b$: the protein
diameter, the double-helical DNA diameter, and the distance from DNA at which non-specific adsorption takes
place. These assumptions are easy to relax.

Throughout this work we disregard the excluded volume of DNA, treating the DNA coil as Gaussian rather than
as a swollen coil described by the Flory index 3/5. This is a reasonable approximation for most realistic
cases [14]. Indeed, for many real DNAs, such as λ-DNA, it is justified because of the large
persistence-length-to-diameter ratio of the double helix: excluded volume in the coil remains unimportant up
to DNA lengths of about $L < p^3/b^2$ (up to about 100000 base pairs under normal, non-exotic ionic
conditions). We further assume that the volume fraction of DNA inside the volume $v$, which is about
$Lb^2/v$, is sufficiently small even when the DNA is a globule. In particular, we assume $Lb^2/v < b/p$,
because in a denser system liquid-crystalline nematic ordering of DNA segments becomes likely [14]. Of course,
a real nucleoid is a rather complex structure involving much more sophisticated features than just
orientational ordering; they are caused by structural and other proteins, by entanglements, etc. (see the
recent experimental work [9] and references therein). In this paper we touch on none of these issues, guided
by the prejudice that simple questions should be addressed first.



C. Outline 

The plan of the article is as follows. In section II we first consider the relatively simple cases in which
the DNA is a Gaussian coil and 1D sliding of proteins along DNA involves only a small part of the DNA length.
Already in this situation we will be able to explain the effect of correlated re-adsorption and arrive at a
number of new results, such as, for instance, the possibly asymmetric character of the maximum on the curve
of the rate as a function of adsorption strength. These results are also derived through the electrostatic
analogy in appendix B. In section III we present a summary of all possible scaling regimes. We then discuss
them in more detail (section IV). We start by looking at the rate saturation when 1D sliding involves the
entire DNA length (section IV A). We then consider the delicate case in which the DNA as a whole is a globule
(section IV B); in this case, we find that even the 3D transport of proteins is in many cases realized
through the sliding of adsorbed proteins along DNA, using the DNA as a network of 1D transport ways. We
continue in section IV C by looking at the situations in which the diffusion coefficient of the proteins
along DNA is either smaller or larger than their diffusion coefficient in the surrounding bulk water. In
section V A we re-derive all our major results using the language of single-protein search time instead of a
stationary process and flux. Finally, we conclude with a comparison of our results to those of earlier works
and a discussion of possible further implications of our work (section V).



II. SIMPLE CASE: STRAIGHT ANTENNA VS. GAUSSIAN COIL ANTENNA

The reason why non-specific adsorption on DNA can speed up the finding of the target is illustrated in
Fig. 1 (a) and (b): the DNA forms a kind of antenna around the target, thus increasing the size of the
"effective target". How should we determine the size of this antenna? The simplest argument is this. Suppose
the antenna size is $\xi$ and the contour length of DNA inside the antenna is $\lambda$. It is worth
emphasizing that $\xi$ and $\lambda$ do not define any sharp border, but rather a smooth cross-over, such
that transport outside the antenna is mainly due to 3D diffusion, while inside the antenna transport is
dominated by sliding, or 1D diffusion along DNA. The advantage of thinking about a stationary process is
that, under stationary conditions, the flux of particles delivered by 3D diffusion into the $\xi$-sphere of
the antenna must be equal to the flux of particles delivered by 1D diffusion into the target. The former rate
is given by the Smoluchowski formula (see appendix A) for the target size $\xi$ and for the concentration of
"free" (not adsorbed) proteins $c_{\rm free}$; it is $\sim D_3 c_{\rm free} \xi$. To estimate the latter
rate, we note that the time of 1D diffusion into the target site from a distance of order $\lambda$ is about
$\lambda^2/D_1$; therefore, the rate can be written as $(\lambda c_{\rm ads})/(\lambda^2/D_1)$, where
$\lambda c_{\rm ads}$ is the number of proteins non-specifically adsorbed on the piece of DNA of length
$\lambda$. Thus, our main balance equation for the rate $J$ reads

$$J \sim D_3 c_{\rm free}\, \xi \sim \frac{D_1 c_{\rm ads}}{\lambda}. \tag{2}$$

Formally, this equation follows from the continuity equation, which says that the divergence of the flux
must vanish everywhere for a stationary process; the flux must be a potential field.

Notice that the balance equation (2) depends on the relation between $\xi$ and $\lambda$, that is, between
the size of the antenna measured in space ($\xi$) and measured along the DNA ($\lambda$). Here, we already
see why the fractal properties of DNA conformations enter our problem.

To determine the one-dimensional concentration of non-specifically adsorbed proteins, $c_{\rm ads}$, and the
concentration of proteins remaining free in solution, $c_{\rm free}$, we now argue that, as long as the
antenna is only a small part of the DNA present, every protein in the system will adsorb on and desorb from
DNA many times before it locates the target; therefore, there is statistical equilibrium between adsorbed
and desorbed proteins. Assuming that we know the adsorption energy $\epsilon$, or the corresponding constant
$y = e^{\epsilon/k_BT}$, and remembering that adsorbed proteins are confined within a distance of order $b$
from the DNA, we can write down the equilibrium condition as

$$\frac{c_{\rm ads}}{c_{\rm free}\, b^2} = y, \tag{3}$$

which must be complemented by the particle counting condition

$$c_{\rm ads} L + c_{\rm free}\,(v - Lb^2) = cv. \tag{4}$$

Since the volume fraction of DNA is always small, $Lb^2 \ll v$, standard algebra then yields

$$c_{\rm ads} \simeq \frac{cvyb^2}{yLb^2 + v} \simeq
\begin{cases} cyb^2 & \text{if } y < v/Lb^2 \\ cv/L & \text{if } y > v/Lb^2 \end{cases}, \qquad
c_{\rm free} \simeq \frac{cv}{yLb^2 + v} \simeq
\begin{cases} c & \text{if } y < v/Lb^2 \\ cv/Lb^2y & \text{if } y > v/Lb^2 \end{cases}. \tag{5}$$
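
The algebra behind Eq. (5) can be checked with a short sketch. It solves Eqs. (3) and (4) under the stated approximation that the DNA volume $Lb^2$ is negligible compared with $v$; the function name `partition` is an illustrative choice.

```python
def partition(c: float, v: float, L: float, b: float, y: float):
    """Return (c_ads, c_free) solving Eqs. (3) and (4).

    c_ads is the 1D concentration of adsorbed proteins (per unit DNA length),
    c_free the 3D concentration of free proteins. The DNA volume L*b**2 is
    neglected against v, as in the text.
    """
    c_free = c * v / (y * L * b**2 + v)
    c_ads = y * b**2 * c_free          # equilibrium condition, Eq. (3)
    return c_ads, c_free
```

For $y \ll v/Lb^2$ this returns approximately $(cyb^2, c)$, and for $y \gg v/Lb^2$ approximately $(cv/L, cv/Lb^2y)$, which are the two limits quoted in Eq. (5).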



Note that at length scales smaller than the persistence length $p$ the DNA double helix is practically
straight, while at length scales greater than $p$ the double helix as a whole is a Gaussian coil. That means
that if we take a piece of double helix of contour length $\lambda$, then its size in space scales as

$$\xi \sim \begin{cases} \lambda & \text{when } \lambda < p \\ (\lambda p)^{1/2} & \text{when } \lambda > p \end{cases}. \tag{6}$$



Substituting this result into the balance equation (2), we can determine the antenna size and then,
automatically, the rate, the latter being either side of the balance equation. We have to be careful, because
there are already as many as four different scaling regimes, due to equations (5) and (6):

• Regime A - the antenna is straight (upper line of Eq. (6)) and adsorption is relatively weak (upper lines
of Eq. (5));

• Regime B - the antenna is Gaussian (lower line of Eq. (6)), but adsorption is still relatively weak;

• Regime C - the antenna is Gaussian and adsorption is relatively strong (lower lines of Eq. (5));

• Regime D - the antenna is straight and adsorption is strong.



Later we will find plenty more regimes, but for now let us consider just these, one by one.

To begin with, suppose the antenna is straight ($\lambda < p$, so $\lambda \sim \xi$, see Fig. 1 (a)) and
non-specific adsorption is relatively weak ($y < v/Lb^2$, so $c_{\rm ads} \sim cyb^2$ and
$c_{\rm free} \sim c$). In this case, the balance equation yields $\lambda \sim b(yd)^{1/2}$, or for the rate

$$J \sim c \sqrt{D_3 D_1}\, y^{1/2} b; \tag{7}$$

in other words, for the ratio of this rate to the Smoluchowski rate $J_s \sim D_3 c b$, we obtain

$$\frac{J}{J_s} \sim (yd)^{1/2} \quad \text{(regime A)}. \tag{8}$$

This result remains correct as long as the antenna remains shorter than the persistence length, and since we
know $\lambda$, we obtain this condition explicitly: $y < p^2/b^2 d$.

Let us now suppose that non-specific adsorption is still relatively weak ($y < v/Lb^2$, so
$c_{\rm ads} \sim cyb^2$), but strong enough that the antenna is longer than the persistence length
($\lambda > p$, so that $\xi \sim \sqrt{\lambda p}$, see Fig. 1 (b)). Then our balance equation yields
$\lambda \sim (yd)^{2/3} p^{-1/3} b^{4/3}$, or

$$\frac{J}{J_s} \sim \left(\frac{ypd}{b}\right)^{1/3} \quad \text{(regime B)}. \tag{9}$$

One should check that this new result for $\lambda$ implies that $\lambda > p$ at $y > p^2/b^2 d$, and so
$y \sim p^2/b^2 d$ is the cross-over line between the two regimes, A and B. In both regimes, as expected, the
rate grows with the strength of non-specific adsorption, $y$, because increasing $y$ increases the size of
the antenna. However, the functional scaling dependence of the rate on $y$ is significantly different in the
two regimes, reflecting the difference in DNA fractality at different length scales.
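
The two weak-adsorption results can be collected in a small sketch. It assumes few proteins are adsorbed ($y < v/Lb^2$), an antenna shorter than the whole DNA, and scaling accuracy only (all numerical coefficients set to one); the function names are illustrative.

```python
import math

def antenna_length(y: float, p: float, b: float, d: float = 1.0) -> float:
    """Contour length lambda of the antenna from lambda * xi(lambda) ~ y*d*b**2."""
    lam = b * math.sqrt(y * d)          # straight antenna: xi ~ lambda
    if lam <= p:
        return lam
    # coiled antenna: xi ~ sqrt(lambda * p)
    return (y * d) ** (2 / 3) * p ** (-1 / 3) * b ** (4 / 3)

def antenna_size(lam: float, p: float) -> float:
    """Spatial size xi of a DNA piece of contour length lam, Eq. (6)."""
    return lam if lam <= p else math.sqrt(lam * p)

def acceleration_weak(y: float, p: float, b: float, d: float = 1.0) -> float:
    """J/J_s ~ xi/b when few proteins are adsorbed (regimes A and B)."""
    return antenna_size(antenna_length(y, p, b, d), p) / b
```

Both branches of `antenna_length` give $\lambda = p$ at $yd = p^2/b^2$, so the sketch is continuous across the A-B cross-over.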

FIG. 1: Antenna in a variety of cases. The upper part of every figure represents a poor man's idea of a
prokaryotic cell. In figures a and b, the DNA in the cell is a coil, because the coil size $R$ is smaller than
the cell dimension; alternatively, one can think of a dilute solution of DNA in which $R$ is much smaller
than the distance to other coils (not shown). In figure c, the amount of DNA is so large that the coil size
would have exceeded the cell diameter, and so the DNA is a globule; alternatively, one can think of a
semi-dilute solution [15] of strongly overlapping DNA coils. The lower figures represent blow-up views of the
region around the target site on DNA. The antenna part of the DNA around the target is shown in a lighter
color than the rest of the DNA. The space region below the crossover length scale is shaded. This space
region is roughly spherical in cases a and b and sausage-shaped in case c. Figure a also shows the averaged
flow lines of the diffusion, which go in 3D far away from the target and mostly along DNA within the antenna
length scale (they are equivalent to electric field lines in terms of the electrostatic analogy, Appendix B).
In figures b and c flow lines are not shown, simply because it is difficult to draw them. In figure c, we see
that the DNA globule locally looks like a temporary network, with the mesh size $r$. In this case, the
antenna might be much longer than one mesh. In the figure, the mesh size is not larger than the persistence
length, so the length of DNA in the mesh, $g$, is about the same as $r$; at lower density, the mesh size
might be larger, and then the DNA in the mesh would be wiggly, with $g \gg r$.

Before we proceed with the analysis of other scaling regimes, it is useful to make the following comment.
The balance equation (2) describes the fact that every protein going through 3D diffusion far away must then
also go through 1D diffusion closer to the target. In other words, the balance equation (2) describes the
self-establishing match between the 3D and 1D parts of the process. But we can also look at the situation
differently: suppose that one particular protein is adsorbed on DNA at a random place, and let us estimate
the distance it can diffuse along DNA before it desorbs due to a thermal fluctuation. Since the probability
of thermally activated desorption is proportional to $e^{-\epsilon/k_BT} = 1/y$, the time the protein spends
adsorbed must be about $b^2 y/D_3$. During this time, the protein diffuses along DNA by a distance of about
$\ell_{\rm slide} \sim \sqrt{D_1 b^2 y/D_3} = b\sqrt{yd}$. Following [10, 12], we call it the sliding
distance. We see, therefore, that the antenna length $\lambda$ is just about the sliding distance for
straight DNA, but $\lambda \gg \ell_{\rm slide}$ for coiled DNA. At first glance this seems a very weird
result: how can the antenna possibly be longer than the distance over which a protein can slide? In fact, the
antenna does become longer than the bare sliding distance, and this happens because, for coiled DNA, every
protein that desorbs after sliding a distance of the order of $\ell_{\rm slide}$ has a significant chance to
re-adsorb nearby. Such correlated re-adsorption becomes more likely as we consider more and more crumpled
conformations of DNA. Indeed, if we assume in general that $\xi \sim \lambda^{\nu}$, then the balance
equation yields $\lambda \sim y^{1/(1+\nu)}$, which means that $\lambda$ grows with $y$ faster than
$\ell_{\rm slide} \sim y^{1/2}$ at every $\nu < 1$. This growth of $\lambda$ with $y$ becomes increasingly
fast as $\nu$ decreases, which corresponds to more crumpled conformations. We should emphasize that this
mechanism of correlated re-adsorption is impossible to see as long as the polymeric and fractal properties of
DNA are not considered explicitly; that is why this mechanism has been overlooked in previous works.
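
The comparison between antenna length and sliding distance can be made explicit in a minimal sketch. It assumes weak adsorption and, for the second function, a Gaussian-coil antenna; numerical coefficients are dropped and the function names are illustrative.

```python
import math

def antenna_exponent(nu: float) -> float:
    """Exponent a in lambda ~ y**a when xi ~ lambda**nu (from lambda*xi ~ y*d*b**2)."""
    return 1.0 / (1.0 + nu)

def antenna_to_sliding_ratio(y: float, p: float, b: float, d: float = 1.0) -> float:
    """lambda / l_slide for a Gaussian-coil antenna (meaningful for y > p**2/(b**2*d))."""
    lam = (y * d) ** (2 / 3) * p ** (-1 / 3) * b ** (4 / 3)
    l_slide = b * math.sqrt(y * d)
    return lam / l_slide
```

For a straight antenna ($\nu = 1$) the exponent is $1/2$, the same as for the sliding distance, while for a Gaussian coil ($\nu = 1/2$) it is $2/3$. In the coiled case the ratio equals $(ydb^2/p^2)^{1/6}$, so it exceeds one exactly when $y > p^2/b^2d$.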

With a further increase of either the non-specific adsorption strength $y$ or the overall DNA length $L$, we
run into the situation in which most of the proteins are adsorbed on the DNA. In other words, if one prefers
to think in terms of a single diffusing protein, this single protein molecule spends most of its time
adsorbed on DNA far away from the target. For this case, we have to use the lower lines of formulae (5) and
substitute them into the balance equation (2). Since the equilibrium condition (3) is still satisfied, the
result $\lambda \xi \sim ydb^2$, which follows from Eqs. (2) and (3), remains unchanged. Depending on whether
the antenna length $\lambda$ is longer or shorter than the persistence length, we obtain the regimes C and D.

For regime C, we have $\lambda > p$; the antenna is a Gaussian coil and $\xi \sim \sqrt{\lambda p}$, yielding
$\lambda \sim (yd)^{2/3} p^{-1/3} b^{4/3}$ and

$$\frac{J}{J_s} \sim \frac{v (pd)^{1/3}}{L b^{7/3} y^{2/3}} \quad \text{(regime C)}. \tag{10}$$

Given our expression for $\lambda$, the condition $\lambda > p$ implies the familiar $y > p^2/b^2 d$, and
another condition for this regime is that most proteins are adsorbed, or $y > v/Lb^2$, see Eq. (5).

For regime D, the antenna is straight, so $\xi \sim \lambda$, and we get $\lambda \sim b(yd)^{1/2}$, just as
in regime A. For the rate, however, substitution of the lower lines of Eq. (5) into the balance equation (2)
yields

$$\frac{J}{J_s} \sim \frac{v d^{1/2}}{L b^2 y^{1/2}} \quad \text{(regime D)}. \tag{11}$$

According to our discussion, this regime should exist when $y < p^2/b^2 d$ and $y > v/Lb^2$. As we shall see
later, in section IV C, these two conditions can be met together, and room for this regime exists, only if
$d < 1$, that is, when 1D diffusion along DNA is slower than 3D diffusion in space.

In both regimes C and D, the overall rate decreases with increasing non-specific adsorption strength $y$,
because 3D transport to the antenna is slowed down by the lack of free proteins.
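
The four regimes discussed so far can be summarized in a short sketch. It uses only the scaling boundaries $y \sim p^2/b^2d$ and $y \sim v/Lb^2$ and Eqs. (8)-(11) with all numerical coefficients set to one. It assumes that the antenna is shorter than the whole DNA and that the DNA coil fits into the volume $v$; the function names are illustrative.

```python
def simple_regime(y, L, v, p, b, d=1.0):
    """Label A-D for the four regimes of this section (scaling boundaries only)."""
    straight = y < p**2 / (b**2 * d)    # antenna shorter than persistence length
    weak = y < v / (L * b**2)           # most proteins remain free
    if weak:
        return "A" if straight else "B"
    return "D" if straight else "C"

def acceleration_scaling(y, L, v, p, b, d=1.0):
    """J/J_s from Eqs. (8)-(11), numerical coefficients dropped."""
    regime = simple_regime(y, L, v, p, b, d)
    if regime == "A":
        return (y * d) ** 0.5
    if regime == "B":
        return (y * p * d / b) ** (1 / 3)
    if regime == "C":
        return v * (p * d) ** (1 / 3) / (L * b ** (7 / 3) * y ** (2 / 3))
    return v * d ** 0.5 / (L * b**2 * y ** 0.5)    # regime D
```

With $d = 1$, $p/b = 50$, $L/b = 10^4$ and $v/b^3 = 10^9$ (arbitrary illustrative values), increasing $y$ walks through regimes A, B and C in turn; regime D appears in this sketch only when `d` is made sufficiently small.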

We have so far discussed four of the scaling regimes; our results are equations (8), (9), (10) and (11).
Already at this stage, we have gained a simple understanding of the non-monotonic dependence of the rate on
$y$, a phenomenon formally predicted in [7] and observed in [8], but not previously explained qualitatively:
at the beginning, increasing $y$ helps the process because it leads to an increasing antenna length; a
further increase of $y$ is detrimental to the rate because it leads to unproductive adsorption of most of the
proteins. We have also obtained a new feature, absent in previous works: the shape of the maximum on the
$J(y)$ curve is asymmetric, at least if the DNA is not too long: in the regimes B and C, the rate grows as
$y^{1/3}$ and then falls off as $y^{-2/3}$.
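
The asymmetry of the maximum can be checked numerically with a minimal sketch. It assumes a coiled antenna throughout and interpolates between regimes B and C by combining $J \sim D_3 c_{\rm free}\,\xi$ with the full expression for $c_{\rm free}$ from Eq. (5); numerical coefficients are dropped, and the function names are illustrative.

```python
import math

def acceleration_interp(y, L, v, p, b, d=1.0):
    """Smooth interpolation of J/J_s across regimes B and C (coiled antenna).

    Uses c_free/c = v / (y*L*b**2 + v) from Eq. (5) and xi/b = (y*p*d/b)**(1/3).
    """
    return v / (y * L * b**2 + v) * (y * p * d / b) ** (1 / 3)

def log_slope(f, y, eps=1e-4):
    """Numerical d ln f / d ln y by a centred finite difference."""
    return (math.log(f(y * (1 + eps))) - math.log(f(y * (1 - eps)))) / (
        math.log(1 + eps) - math.log(1 - eps))
```

Within this interpolation the logarithmic slope is $1/3 - yLb^2/(yLb^2 + v)$, so it tends to $1/3$ well below $y \sim v/Lb^2$ and to $-2/3$ well above it, and it vanishes at $y = v/2Lb^2$.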

Since there are quite a few more scaling regimes, it is 
easier to understand them if we now interrupt and offer 
the summary of all regimes as presented in Figure 2 and 
Table I. 



III. SUMMARY OF THE RESULTS: SCALING REGIMES

Our results are summarized in Fig. 2 and in Table I. Figure 2 represents the log-log plane of the parameters
$L$ and $y$, and each line on this plane marks a cross-over between scaling regimes. This figure gives the
diagram of scaling regimes for the specific case $d = 1$ (or $D_1 = D_3$); later on, in section IV C, we will
return to the more general situation and present the corresponding diagrams for both the $d < 1$ and $d > 1$
cases.

To be systematic, let us start our review of scaling regimes with the two trivial cases, which correspond to
the axes in Fig. 2. When $y < 1$, there is no non-specific binding of proteins to the DNA, and no sliding
along DNA. Proteins find their specific target at a rate equal to the Smoluchowski rate, or $J/J_s = 1$.
Similarly, if the DNA is very short, as short as the specific target site itself, or $L \sim b$, then once
again $J/J_s = 1$ for a trivial reason. Since we assume that there is some non-specific adsorption, or
$y > 1$, and since the DNA length is obviously always greater than the target size $b$, our diagram in Fig. 2
presents only the $y > 1$ and $L/b > 1$ region, which is why the pure Smoluchowski regime is seen only on the
axes.

FIG. 2: Diagram of scaling regimes for the case $d = 1$, when diffusion along DNA has the same diffusion
coefficient as diffusion in the surrounding water. Both the $L$ and $y$ axes are on a logarithmic scale. When
the DNA is shorter than the persistence length ($b < L < p$), it is essentially a rod; it is a Gaussian coil
as long as it is longer than the persistence length but the coil size is smaller than the linear size of the
restriction volume $v$ ($p < L < v^{2/3}/p$); it is globular at $L > v^{2/3}/p$; and we only consider $L$ up
to about $v/pb$, because at larger $L$ DNA segments start forming liquid-crystalline order. A summary of the
rates for each regime is found in Table I. Here, as well as in the other figures, to make the formulae
shorter, all lengths are measured in units of $b$, meaning that $L$, $p$, and $v$ stand for $L/b$, $p/b$, and
$v/b^3$.
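
The length boundaries quoted in the caption of Fig. 2 can be sketched as a simple classifier. The thresholds are scaling estimates only (numerical coefficients dropped), and the function name and return labels are illustrative.

```python
def dna_conformation(L: float, p: float, v: float, b: float = 1.0) -> str:
    """Classify DNA of contour length L confined in volume v (scaling estimate)."""
    if L < p:
        return "rod"            # shorter than the persistence length
    if L < v ** (2 / 3) / p:
        return "coil"           # Gaussian coil of size sqrt(L*p) fits in v**(1/3)
    if L < v / (p * b):
        return "globule"        # confined, but below the liquid-crystalline limit
    return "outside the range considered"
```

For example, with $p/b = 50$ and $v/b^3 = 10^9$ (arbitrary illustrative values), the coil-to-globule boundary lies at $L/b = 2 \times 10^4$ and the upper limit at $L/b = 2 \times 10^7$.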

If we increase $y$ and consider the $y > 1$ situation, then we have significant non-specific adsorption of
proteins on DNA, which increases the rate due to the antenna effect. If $y$ remains moderate, the antenna is
shorter than the DNA persistence length and is straight. This is the regime labelled A in Fig. 2 and
described by formula (8). With a further increase of $y$, when $y > p^2/b^2 d$, we cross over into the regime
labelled B and described by formula (9); in this regime the antenna is so long that it is a Gaussian coil.
From regime B, we can cross the line $y = v/Lb^2$ and get into the regime labelled C and described by formula
(10). One can cross over into regime C by increasing either $y$ or $L$, because increasing either of these
variables promotes unproductive non-specific adsorption of proteins on far-away pieces of DNA and thus slows
down the transport to the specific target.

From regime A, we can also cross the line $y = v/Lb^2$, but as long as $d = 1$ this does not
bring us to regime D; instead we get to the new regime labelled I, which we will explain a
few lines below.

TABLE I: The summary of rates and antenna lengths in various regimes. In labelling regimes, we skip J and L to avoid
confusion with the rate and DNA length.

| Regime | Description | $J/J_s$ | $\lambda$ |
|---|---|---|---|
| Axes | Smoluchowski: no antenna | $1$ | $b$ |
| A | straight antenna, few proteins adsorbed | $(yd)^{1/2}$ | $b(yd)^{1/2}$ |
| B | coiled antenna, few proteins adsorbed | $(ypd/b)^{1/3}$ | $(yd)^{2/3} p^{-1/3} b^{4/3}$ |
| C | coiled antenna, most proteins adsorbed | $v(pd)^{1/3} / (L b^{7/3} y^{2/3})$ | $(yd)^{2/3} p^{-1/3} b^{4/3}$ |
| D ($d < 1$) | straight antenna, most proteins adsorbed | $v d^{1/2} / (L b^2 y^{1/2})$ | $b(yd)^{1/2}$ |
| E | whole DNA as straight antenna, few proteins adsorbed | $L/b$ | $L$ |
| F | whole DNA as coiled antenna, few proteins adsorbed | $(Lp/b^2)^{1/2}$ | $L$ |
| G | whole DNA as antenna, most proteins adsorbed | $vd / (L^2 b)$ | $L$ |
| H | antenna with coiled mesh, most proteins adsorbed | $(p/b^2)\,(vd/Ly)^{1/2}$ | $(b/p)\,(vyd/L)^{1/2}$ |
| I | antenna with straight mesh, most proteins adsorbed | $v d^{1/2} / (L b^2 y^{1/2})$ | $b(yd)^{1/2}$ |
| K ($d > 1$) | antenna with straight mesh, few proteins adsorbed | $(yd)^{1/2}$ | $b(yd)^{1/2}$ |
| M ($d > 1$) | antenna with coiled mesh, few proteins adsorbed | $p\,(ydL/v)^{1/2}$ | $(b/p)\,(vyd/L)^{1/2}$ |
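
The sequence A, B, C described above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it evaluates the Table I scaling expressions for regimes A, B, and C (all numerical prefactors dropped) and lets one confirm that neighbouring expressions agree on the cross-over lines $y = p^2/b^2 d$ and $y = v/Lb^2$. The function names and the sample parameter values are illustrative assumptions.

```python
def rate_A(y, d=1.0):
    """Regime A: straight antenna, few proteins adsorbed."""
    return (y * d) ** 0.5


def rate_B(y, p, d=1.0, b=1.0):
    """Regime B: coiled antenna, few proteins adsorbed."""
    return (y * p * d / b) ** (1.0 / 3.0)


def rate_C(y, L, v, p, d=1.0, b=1.0):
    """Regime C: coiled antenna, most proteins adsorbed."""
    return v * (p * d) ** (1.0 / 3.0) / (L * b ** (7.0 / 3.0) * y ** (2.0 / 3.0))


def rate_coil(y, L, v, p, d=1.0, b=1.0):
    """J/J_s along the path A -> B -> C (scaling estimate only).

    Assumes y*d >= 1, the ordering p^2/(b^2 d) < v/(L b^2) of the two
    cross-overs, and y below the saturation line (regimes E, F, G excluded).
    """
    y_ab = p ** 2 / (b ** 2 * d)   # antenna length reaches the persistence length
    y_bc = v / (L * b ** 2)        # about half of the proteins are adsorbed
    if not y_ab < y_bc:
        raise ValueError("parameters do not give the A -> B -> C sequence")
    if y < y_ab:
        return rate_A(y, d)
    if y < y_bc:
        return rate_B(y, p, d, b)
    return rate_C(y, L, v, p, d, b)


# Hypothetical parameters, all lengths in units of b: p = 25, L = 1e4, v = 1e9.
p, L, v = 25.0, 1.0e4, 1.0e9
for y in (1e1, 1e3, 1e5, 1e6):
    print(f"y = {y:.0e}  J/J_s ~ {rate_coil(y, L, v, p):.3g}")
```

With these sample values the estimate first grows with $y$ and then falls once $y$ exceeds $v/Lb^2$, which is the asymmetric maximum discussed in the text.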

To understand all other scaling regimes, we have to remember that our previous consideration
throughout Section II was restricted in two respects. First, we assumed that the entire DNA
in the form of a Gaussian coil fits within volume $v$, which is true only as long as
$L < v^{1/3}$ and $\sqrt{Lp} < v^{1/3}$, where $v^{1/3}$ stands for the linear dimension of
the restriction volume. To relax this assumption, we will have to consider a long DNA which
is many times reflected by the walls of volume $v$ and which inside volume $v$ represents a
globule, locally looking like a semi-dilute solution of separate DNA pieces, as illustrated
in Fig. 1 (c). For such long DNA, we shall find two more regimes, labelled H and I in Fig. 2.
Second, we assumed that the antenna length $\lambda$ was smaller than the full DNA length $L$;
the consequence of this was our statement (3) that there is equilibrium between adsorbed and
dissolved proteins. Relaxing this assumption, we will have to discuss the regimes labelled
E, F, and G in Fig. 2.

In Figure 3, we present a schematic $y$-dependence of the rate for a number of values of the
DNA length $L$. Each curve is labelled with the corresponding value of $L$. To be specific,
we have chosen the lengths which correspond to various cross-overs and are marked on the
scaling regimes diagram, Figure 2. Note that in many cases our result for the rate exhibits a
maximum and saturation beyond the maximum, features first described in the work of BWH,
Ref. [7]. Unlike BWH, we find that the maximum is asymmetric and, even more importantly,
that $J/J_s$ can become much smaller than unity, i.e., one can observe deceleration in
comparison with the Smoluchowski rate. We also find a number of other features, such as
specific power-law scaling behavior of the rate.

Thus, we have to discuss one by one all the new regimes 
E, F, G, H, I. This is what we do in the next section IV. 



FIG. 3: Schematic representation of the rate dependence on $y$. Both the rate $J$ and $y$ are
given in logarithmic scale. The fraction next to each curve shows its slope, which is the
power of the $J(y)$ dependence. Each curve corresponds to the specified value of DNA length
$L$, also indicated in Figure 2; the length $L$ is shown above the right end of each curve.
Experimentally, the value of $y$ can be controlled through the salt concentration, because
non-specific adsorbtion of proteins is controlled by Coulomb interaction between negative DNA
and the positive patch on the protein surface; for instance, if the salt is KCl, then it is
believed [8, 11] that y = 10 [KCl] + 2.5, where [KCl] is the molar concentration of the salt.
Note that we recover the possibility, first indicated in [7], that the rate goes through the
maximum and then saturates, but in our case the maximum is in many cases asymmetric, while
at large $y$ the rate becomes very small, $J/J_s \ll 1$, particularly for long DNA. Here, as
well as in the other figures, to make formulae look shorter, all lengths are measured in the
units of $b$, meaning that $L$, $p$, and $v$ stand for $L/b$, $p/b$, and $v/b^3$.
IV. SYSTEMATIC CONSIDERATION OF SCALING REGIMES

A. DNA is not long enough for full antenna 

If DNA is too short for the antenna, then proteins already adsorbed on DNA can find their
target faster than new proteins can be delivered to the DNA from solution. There is no
adsorbtion equilibrium any longer, and instead of formula (3) we can only claim that
$c_{\rm ads} < y c_{\rm free} b^2$. Therefore, the amount of adsorbed proteins under
stationary conditions is determined by the stationarity itself, which means that we have to
treat formula (2) as two equations. In doing so, we have to replace $\lambda$ in the
right-hand side (the one-dimensional rate) by $L$, because we do not have more DNA than $L$,
and we have to replace $\xi$ in the left-hand side, which is the antenna size for 3D
transport, by $R$, the overall size of the DNA coil. Of course, the particle counting
equation (4) is still valid; it is the third equation. Thus, our equations read:



$$ J \sim D_3\, c_{\rm free}\, R \sim D_1\, \frac{c_{\rm ads}}{L}, \qquad c_{\rm ads} L + c_{\rm free} v = c v. \tag{12} $$

From here, we find

$$ \frac{J}{J_s} \sim \frac{vRd/b}{RL^2 + vd}. \tag{13} $$



We can now easily address all possible scaling regimes in 
which antenna is longer than DNA. 

To begin with, it is possible that the DNA length is shorter than the DNA persistence
length, $L < p$, such that the entire DNA is essentially straight, and then $R \sim L$.
Assuming also $L^3 < v$, we arrive at the scaling regime labelled E in Fig. 2; in this regime

$$ \frac{J}{J_s} \sim \frac{L}{b} \quad \text{(regime E)}. \tag{14} $$



The borderline of this regime can be established from the condition that, since the entire
DNA is shorter than the "equilibrium" antenna, we must expect $c_{\rm ads}$ to be smaller
than its equilibrium value, or $c_{\rm ads}/c_{\rm free} b^2 < y$. According to the second of
the formulae (12) we have $c_{\rm ads}/c_{\rm free} = LR/d$, so the condition reads
$LR/d < yb^2$; at $L < p$ this yields $y > L^2/b^2 d$. We arrive at the same condition from
the other side of the cross-over by noting that regime A continues as long as the antenna is
shorter than the entire DNA, $\lambda < L$; using our result for $\lambda$ in regime A, this
produces the same cross-over line between regimes A and E.

For longer DNA, when $L > p$, the entire DNA is a Gaussian coil, and its size is
$R \sim (Lp)^{1/2}$. Still assuming that the second term dominates in the denominator of
formula (13), we arrive at

$$ \frac{J}{J_s} \sim \left(\frac{Lp}{b^2}\right)^{1/2} \quad \text{(regime F)}. \tag{15} $$



This regime is labelled F in Fig. 2. Its borderline with regime E is obviously the vertical
line $L = p$. As regards the cross-over to regime B, once again it can be established either
from $c_{\rm ads}/c_{\rm free} = LR/d < yb^2$ for regime F or from $\lambda < L$ for regime B.
Either way we arrive at the cross-over condition $y = L^{3/2} p^{1/2}/b^2 d$.

For even longer DNA, the antenna length becomes equal to the length of the entire DNA only
at such large $y$ that the system is already in regime C, with the rate falling with
increasing $y$ because of the unproductive adsorbtion of proteins. Since the antenna length
$\lambda$ in regime C is given by the same formula as in regime B, the upper border line of
regime C is the continuation of the corresponding line bordering regime B; it is
$y = L^{3/2} p^{1/2}/b^2 d$. However, when we cross this line upwards from regime C, we arrive
at a new situation, because now the first term dominates in the denominator of equation (13),
meaning that most of the proteins are adsorbed on DNA, such that we obtain

$$ \frac{J}{J_s} \sim \frac{vd}{L^2 b} \quad \text{(regime G)}. \tag{16} $$



The cross-over between this regime and regime F is the vertical line at which both terms
in the denominator of equation (13) are comparable; it is $L = (vd)^{2/5}/p^{1/5}$. The
cross-over line with regime C can once again be established from the condition
$c_{\rm ads}/c_{\rm free} = LR/d < yb^2$.

In all regimes E, F, and G the rate saturates with in- 
creasing y. For the regimes E and F this happens after 
just initial growth of rate; for the regime G saturation 
occurs after rate goes through the maximum and starts 
decreasing. In all cases saturation is due to the fact that 
increasing adsorbtion strength does not lead to any in- 
crease of the antenna size, because already the entire 
DNA is employed as antenna and antenna has nowhere 
to grow. 
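
The interpolation formula (13) and its three limits can be illustrated with a short numerical sketch. This is not part of the original analysis; it assumes $R \sim L$ for $L < p$ and $R \sim (Lp)^{1/2}$ otherwise, drops all numerical prefactors, and uses hypothetical parameter values in units of $b$.

```python
def dna_size(L, p):
    """Overall size R of the DNA: a rod for L < p, a Gaussian coil otherwise."""
    return L if L < p else (L * p) ** 0.5


def saturated_rate(L, v, p, d=1.0, b=1.0):
    """J/J_s from formula (13), valid when the whole DNA acts as the antenna."""
    R = dna_size(L, p)
    return (v * R * d / b) / (R * L ** 2 + v * d)


# Hypothetical parameters, all lengths in units of b: p = 25, v = 1e12, d = 1.
p, v = 25.0, 1.0e12
L_star = v ** 0.4 / p ** 0.2   # F-G cross-over, L = (vd)^(2/5) / p^(1/5) at d = 1
for L in (10.0, 1.0e3, L_star, 1.0e6):
    print(f"L = {L:10.4g}  J/J_s ~ {saturated_rate(L, v, p):.4g}")
```

For short DNA the estimate follows $L/b$ (regime E), for intermediate lengths $(Lp/b^2)^{1/2}$ (regime F), and for long DNA $vd/L^2 b$ (regime G); at the F-G cross-over the two terms in the denominator are equal.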



B. Cell is not big enough to house DNA Gaussian coil



When DNA is very long for a given volume, specifically, when $(Lp)^{1/2} > v^{1/3}$, DNA
cannot remain just a coil; it must be a globule, as it is forced to return many times back
into the volume after touching the walls (see, for instance, [14]). For the purposes of this
work, it is sufficient to keep assuming that the excluded volume of DNA is not important,
because the volume fraction of DNA within the confinement volume $v$ is still small, and even
small compared to $b/p$. Nevertheless, the system locally looks like a so-called semi-dilute
solution of DNA, or a transient network with a certain mesh size (see Figure 1c).
We should recall some basic facts regarding the semi-dilute solution or transient network
[14, 15]. Let us denote by $r$ the characteristic length scale of a mesh in the network; it
is in the scaling sense the same as the characteristic radius of density-density correlation
(see Figure 1c). Let us further denote by $g$ the characteristic length along the polymer
corresponding to the spatial distance $r$. The quantities $r$ and $g$ can be estimated from
the following physical argument [14, 15]. Consider a piece of polymer of length $g$ starting
from some particular monomer; it occupies a region $\sim r^3$ and creates a density of about
$g/r^3$. This density must be about the overall average density, which for our system is of
the order of $L/v$. Thus, $g/r^3 \sim L/v$. The second relation between $g$ and $r$ is similar
to formula (6); it depends on whether the mesh size is bigger or smaller than the persistence
length $p$:

$$ r \sim \begin{cases} g & \text{if } g < p \\ \sqrt{gp} & \text{if } g > p \end{cases} \tag{17} $$



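Accordingly, we obtain after some algebra

$$
\begin{array}{lll}
g \sim \left(\dfrac{v}{L}\right)^{1/2}, & r \sim \left(\dfrac{v}{L}\right)^{1/2} & \text{if } L > \dfrac{v}{p^2}, \\[2ex]
g \sim \dfrac{v^2}{L^2 p^3}, & r \sim \dfrac{v}{Lp} & \text{if } \dfrac{v^{2/3}}{p} < L < \dfrac{v}{p^2}.
\end{array}
\tag{18}
$$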



The upper line corresponds to the network so dense that 
every mesh is shorter than persistence length and poly- 
mer is essentially straight within each mesh. The lower 
line describes much less concentrated network, in which 
every mesh is represented by a little Gaussian coil. 
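
The two branches of formula (18) can be written as a small helper. This is only an illustrative sketch (the function name and the sample values are assumptions, with all lengths in units of $b$); it returns the scaling estimates of $g$ and $r$ and lets one verify the density condition $g/r^3 \sim L/v$.

```python
def mesh(L, v, p):
    """Return (g, r) from formula (18): contour length per mesh and mesh size."""
    if L <= v ** (2.0 / 3.0) / p:
        raise ValueError("DNA fits in the volume as a coil; no network is formed")
    if L > v / p ** 2:
        g = (v / L) ** 0.5               # dense network: straight DNA inside each mesh
        r = g
    else:
        g = v ** 2 / (L ** 2 * p ** 3)   # sparse network: Gaussian coil inside each mesh
        r = v / (L * p)
    return g, r


# Hypothetical parameters in units of b: p = 25, v = 1e9.
p, v = 25.0, 1.0e9
for L in (1.0e5, 1.0e7):
    g, r = mesh(L, v, p)
    print(f"L = {L:.0e}: g = {g:.4g}, r = {r:.4g}, g/r^3 = {g / r**3:.3g}, L/v = {L / v:.3g}")
```

The two branches give the same $g \sim r \sim p$ on the line $L = v/p^2$, as expected for a cross-over.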

Returning to our problem, we should realize that the antenna length $\lambda$ can in fact be
longer than the mesh size $g$, as illustrated in Fig. 1 (c). To estimate the antenna size for
this case, we should remember that desorbtion from the antenna does not necessarily break the
sliding along DNA completely, because the protein can still re-adsorb on a nearby place of
DNA or, more generally, on a correlated place on DNA. To account for this, let us imagine
that the antenna part of DNA is decorated by a tube of radius $r$. Since $r$ is the
correlation length in the DNA solution, the protein remains correlated with the antenna as
long as it remains within this tube around the antenna. Accordingly, our main balance
equation (2) must be modified to account for the fact that 3D transport on scales larger than
$r$ is now realized through the DNA network and, therefore, the task of regular 3D diffusion
is only to deliver proteins over a length scale of the order of one mesh size $r$, into any
one of the $\lambda/g$ network meshes along the antenna. The rate of delivery into one such
mesh would be $\sim D_3 c_{\rm free} r$, so the overall delivery rate into the antenna tube
scales as $\sim D_3 c_{\rm free} r \lambda/g$. As usual, this must be equal to the rate of 1D
delivery along the antenna into the specific target, so instead of (2) we finally get



$$ J \sim D_3\, c_{\rm free}\, r\, \frac{\lambda}{g} \sim D_1\, \frac{c_{\rm ads}}{\lambda}. \tag{19} $$

As long as the antenna is shorter than the entire DNA, the relation between $c_{\rm free}$
and $c_{\rm ads}$ equilibrates and obeys (3-5), so we finally get

$$ \lambda^2 \sim b^2\, y d\, \frac{g}{r} \tag{20} $$

and

$$ \frac{J}{J_s} \sim \frac{c_{\rm free}}{c}\, \frac{b}{r} \left(\frac{y d v}{L b^2}\right)^{1/2}. \tag{21} $$

What is nice about this formula is that it remains correct in a variety of circumstances:
when the antenna is straight ($\lambda < p$), when it is Gaussian
($p < \lambda < v^{2/3}/p$), or when it is a globule ($\lambda > v^{2/3}/p$).

Taking $r$ and $g$ from the formulae (18), we finally obtain two new regimes. When every
mesh is Gaussian,

$$ \frac{J}{J_s} \sim \frac{p}{b^2} \left(\frac{vd}{yL}\right)^{1/2} \quad \text{(regime H)}. \tag{22} $$

This regime borders regime C along the line where the antenna size is equal to the mesh
size, $\lambda = g$, which reads $y = v^3/(L^3 p^4 b^2 d)$. Regime H also borders regime G
along the line where the antenna is as long as the entire DNA, $\lambda = L$, or
$y = L^3 p^2/vb^2 d$. Finally, regime H also borders another regime I along the vertical line
$L = v/p^2$, which corresponds to the DNA within every mesh becoming straight (shorter than
the persistence length). For this regime, we have to use the upper line in formulae (18),
thus obtaining

$$ \frac{J}{J_s} \sim \frac{v d^{1/2}}{L b^2 y^{1/2}} \quad \text{(regime I)}. \tag{23} $$

This regime borders the saturation regime G along the line $y = L^2/b^2 d$ where
$\lambda = L$.
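
As a numerical cross-check of the two network regimes, the sketch below evaluates the antenna length and the rate from the general relations (20) and (21) and compares them with the closed forms for regimes H and I. It is an illustration under stated assumptions, not the authors' procedure: adsorbtion equilibrium $c_{\rm ads}/c_{\rm free} = yb^2$ is combined with particle counting to give $c_{\rm free}/c = v/(v + yLb^2)$, the mesh parameters $g$ and $r$ are taken from formula (18), prefactors are dropped, and the sample values (in units of $b$) are hypothetical.

```python
def network_regime(y, L, v, g, r, d=1.0, b=1.0):
    """Antenna length and J/J_s from formulae (20) and (21).

    g, r: contour length per mesh and mesh size, taken from formula (18).
    """
    free_fraction = v / (v + y * L * b ** 2)      # c_free / c
    lam = b * (y * d * g / r) ** 0.5              # formula (20)
    rate = free_fraction * (b / r) * (y * d * v / (L * b ** 2)) ** 0.5  # formula (21)
    return lam, rate


v, p = 1.0e9, 25.0

# Regime H: Gaussian mesh, L = 1e5 gives g = 6400, r = 400 from formula (18).
lam_H, rate_H = network_regime(1.0e8, 1.0e5, v, 6400.0, 400.0)
closed_H = p * (v / (1.0e8 * 1.0e5)) ** 0.5       # formula (22) with b = d = 1

# Regime I: straight mesh, L = 1e7 gives g = r = 10 from formula (18).
lam_I, rate_I = network_regime(1.0e6, 1.0e7, v, 10.0, 10.0)
closed_I = v / (1.0e7 * (1.0e6) ** 0.5)           # formula (23) with b = d = 1

print(f"H: lambda ~ {lam_H:.3g}, J/J_s ~ {rate_H:.4g} (closed form {closed_H:.4g})")
print(f"I: lambda ~ {lam_I:.3g}, J/J_s ~ {rate_I:.4g} (closed form {closed_I:.4g})")
```

Both sample points give $J/J_s < 1$, i.e. a search slower than the Smoluchowski rate, in line with the deceleration discussed in the text.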

As regards the lower border of regime I, it corresponds to the situation when the antenna
becomes straight, which happens at $y = v/Lb^2 d$. However, as long as $d = 1$, which is the
case presented in Figure 2, this line coincides with the line $y = v/Lb^2$ below which most
proteins are desorbed and free in solution. That is why at $d = 1$ there is no room for
regime D, in which the antenna is straight but most proteins are adsorbed. Indeed, when
$d = 1$, 3D transport is mostly realized by sliding along the network edges as soon as most
proteins are adsorbed, which precisely means that regime A crosses over directly to regime I.

As we see, in both the H and I regimes the rate $J$ decreases with growing $y$, but does so
more slowly than in regime C: only as $y^{-1/2}$ instead of $y^{-2/3}$. This happens because
adsorbed proteins are not just taken away from the process, as in regime C, but participate
in 3D transport through the network, although this transport is still rather slow.

This completes our scaling analysis for the d = 1 case 
shown in Fig. 2. 
C. Diffusion rate along DNA is different from that in surrounding water

Let us now relax the $d = 1$ condition and examine the cases when diffusion along DNA is
either slower ($d < 1$) or faster ($d > 1$) than in the surrounding water.

First let us consider the $d < 1$ case, when diffusion along DNA is slower than that in the
surrounding water ($D_1 < D_3$); the corresponding scaling regimes are summarized in the
diagram of Figure 4. Most of the diagram is topologically similar to that in Figure 2, and we
do not repeat the corresponding analysis. Of course, there are now powers of $d$ in all
equations, but the major qualitative novelty is that there is now room for regime D,
sandwiched between regimes A and I. The formal reason why this regime now exists in a
separate region is that the line $y = v/Lb^2 d$ goes above the line $y = v/Lb^2$. To
understand the more meaningful physical difference, let us recall that the line
$y = v/Lb^2$ marks the cross-over above which most of the proteins are adsorbed, but at
$d < 1$ this is not enough for the sliding-along-network mechanism to dominate in the 3D
transport.

Interestingly, the rate for both regimes D and I is given by the same formula; compare
Eqs. (11) and (23). This happens because the antenna is straight in regime D and, while the
antenna is not straight in regime I, it still consists of a number of essentially straight
pieces, each representing one mesh. The major difference between regimes D and I, despite
the similar scaling of the rate, is in the mechanism of diffusion: in regime D, proteins
diffuse through the water in the usual manner, while in regime I they are mostly transported
along the network of DNA, with only short "switches" on the scale of one mesh size $r$
between sliding tours. This is why straight pieces of DNA in different meshes independently
add together to yield the same overall formula for the rate as in regime D.

Let us now switch to the opposite limit and consider the $d > 1$ case, for which the results
are summarized in Figure 5. This diagram is quite similar to the previously considered ones
in Figures 2 and 4, except that there are now two new regimes labelled K and M (in the
alphabetical labelling of the regimes we skip J and L to avoid confusion with the rate and
the DNA length). These regimes are both below the line $y = v/Lb^2$, which means that most of
the proteins are not adsorbed. However, since $d > 1$, the new physical feature of the
situation is that adsorbed proteins, although they are in the minority, can nevertheless
dominate in 3D transport by sliding along the DNA network, because sliding is now so fast.
Thus, regimes K and M are the ones in which effective diffusion along the DNA network
dominates, so we have to use formula (19) for the rate and antenna size, while for the
concentrations of free and adsorbed proteins we have to use the upper lines in the formulae
(5). In regime K, the local concentration of DNA segments is so high that every mesh in the
DNA network contains an essentially straight piece of DNA, so we have to use the upper line
in formula (18), yielding (after some algebra)

$$ \frac{J}{J_s} \sim (yd)^{1/2} \quad \text{(regime K)}. \tag{24} $$

Similarly, in regime M the mesh of the DNA network is Gaussian, we have to use the lower
line in equation (18), and this produces

$$ \frac{J}{J_s} \sim p \left(\frac{y d L}{v}\right)^{1/2} \quad \text{(regime M)}. \tag{25} $$

Since the majority of proteins are not adsorbed, it is not surprising that the rate grows
with $y$ in both regimes K and M. Notice that the rate is given by the same formula for
regimes A and K; compare (8) and (24). As with regimes D and I discussed before, the rate is
given by the same formula although the underlying diffusion mechanism is fundamentally
different. In both cases, D and I or A and K, it is possible that although the scaling laws
are the same, the numerical pre-factors are different.

It is also interesting to note that the cross-over between regimes B and M takes place on
the line $y = v^3/p^4 L^3 b^2 d$, where the antenna length is equal to the DNA length in one
mesh: on the side of regime B, the antenna is shorter than one mesh, and transport to the
antenna must be through water; on the side of M, the antenna is longer than one mesh, and
effective transport along the DNA network is at play.
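
For reference, the results for regimes K and M can be traced step by step. The short derivation below is a sketch under two assumptions: the network balance (19) with (20) is used in the combined form $J/J_s \sim (c_{\rm free}/c)(ydr/g)^{1/2}$, and few proteins are adsorbed, so that $c_{\rm free} \approx c$. The mesh parameters are those of formula (18).

```latex
\begin{align*}
\frac{J}{J_s} &\sim \frac{c_{\rm free}}{c}\left(\frac{y d\, r}{g}\right)^{1/2}
   \approx \left(\frac{y d\, r}{g}\right)^{1/2}, \\
\text{regime K } (r \sim g): \quad
\frac{J}{J_s} &\sim (yd)^{1/2}, \\
\text{regime M } \left(g \sim \frac{v^2}{L^2 p^3},\ r \sim \frac{v}{Lp}\right): \quad
\frac{r}{g} \sim \frac{L p^2}{v}
\;\Rightarrow\;
\frac{J}{J_s} &\sim p\left(\frac{y d L}{v}\right)^{1/2}.
\end{align*}
```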

D. Maximal rate 

To finalize our discussion of scaling regimes, it is reasonable to ask: what is the maximal
possible rate? According to our results, the maximal rate is achieved on the border between
regimes F and G, that is, at $L \sim (vd)^{2/5}/p^{1/5}$ and at
$y > v^{3/5} p^{1/5}/b^2 d^{2/5}$. The maximal possible acceleration compared to the
Smoluchowski rate is about $(v p^2 d/b^5)^{1/5}$. It is interesting to note that the "optimal
strategy" for achieving the maximal rate at the minimal possible $y$ requires having the
adsorbtion strength $y$ right at the level at which the probability of non-specific
adsorbtion for every protein is about 1/2 (on the line $y \sim v/Lb^2$).

It is interesting that the maximal possible acceleration 
grows with overall volume v, which may seem counterin- 
tuitive. This result is due to the fact that total amount 
of DNA grows with increasing v, and, according to our 
assumption, all this DNA has still just one target. 
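
The optimum described in this subsection can be put into numbers with a few lines of code. This is a rough illustration only: prefactors are dropped, the function name is an assumption, and the sample values of $v$ and $p$ (in units of $b$) are hypothetical rather than taken from any experiment.

```python
def optimal_search(v, p, d=1.0, b=1.0):
    """Scaling estimates on the F-G border, where the rate is maximal."""
    L_opt = (v * d) ** 0.4 / p ** 0.2                   # optimal DNA length
    y_min = v ** 0.6 * p ** 0.2 / (b ** 2 * d ** 0.4)   # smallest y that reaches the plateau
    gain = (v * p ** 2 * d / b ** 5) ** 0.2             # maximal acceleration J/J_s
    return L_opt, y_min, gain


# Hypothetical parameters in units of b: v = 1e9, p = 25, d = 1.
v, p = 1.0e9, 25.0
L_opt, y_min, gain = optimal_search(v, p)
print(f"L_opt ~ {L_opt:.4g}, y_min ~ {y_min:.4g}, max J/J_s ~ {gain:.4g}")

# Consistency with regime F, J/J_s ~ (L p)^(1/2) / b, and with the line y ~ v / (L b^2).
print(f"(L_opt p)^(1/2) = {(L_opt * p) ** 0.5:.4g}, v / L_opt = {v / L_opt:.4g}")
```

The two printed consistency values coincide with the gain and with `y_min`, which is the statement that the optimum sits where half of the proteins are adsorbed.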

V. DISCUSSION 
A. Single protein view 

Many of the previous theoretical works [10-13] looked at the situation in terms of a single
protein molecule diffusing to its target. In this view, one should imagine that a protein
molecule is initially introduced into a random place within volume $v$, and then one should
ask what is the first passage time [16] needed for the protein to arrive at the specific
target site on DNA. The mean first passage time $\tau$ can of course be found using our
results for the rate $J$ by inverting the value of the rate and assuming that on average
there is just one protein molecule in the system at any time: $\tau = 1/J|_{c = 1/v}$.
However, we want to re-derive all our results directly in terms of $\tau$ in order to build
bridges to the works of other authors. The re-derivation also turns out to be quite
illuminating.

FIG. 4: Scaling regimes for the case $d < 1$. The major difference from the $d = 1$ case is the presence of regime D, in which
the majority of proteins are adsorbed, but still the dominant 3D transport is the usual diffusion through the surrounding water,
because sliding along DNA is too slow ($D_1 < D_3$). In this figure, as well as in the other figures, to make formulae look shorter,
all lengths are measured in the units of $b$, meaning that $L$, $p$, and $v$ stand for $L/b$, $p/b$, and $v/b^3$.

First let us consider the case when DNA is a globule, $L > v^{2/3}/p$ (or a semi-dilute
solution), and look at the regimes H, I, K, and M; unlike the stationary diffusion approach
above, in the single-protein language the derivation for the globular DNA case is actually
simpler. Following [13], we imagine that the search process for the given single protein
consists of tours of 1D sliding along DNA followed by diffusion in 3D, followed by 1D
sliding, etc. If in one tour of 1D sliding the protein moves some distance $\lambda$ along
DNA, then it takes a time of about $\lambda^2/D_1$. The length $\lambda$ here is, of course,
our familiar antenna length, but we will re-derive it here, so we do not assume it known. As
regards the tour of 3D diffusion, it breaks the correlation of the 1D sliding if it carries
the protein over a distance larger than or about the correlation length in the DNA system,
which is the mesh (or blob) size $r$. Thus, the duration of one tour of 3D diffusion is about
$r^2/D_3$.

The next step of our argument is this. On its way to the target, the protein will go
through a great many adsorbtion and desorbtion cycles; therefore, the ratio of the times the
protein spends adsorbed and desorbed should simply follow equilibrium Boltzmann statistics:

$$ \frac{\lambda^2/D_1}{r^2/D_3} \sim \frac{y L b^2}{v}. \tag{26} $$



(Here, we note parenthetically that there is an approximation underlying our argument: one
tour of "correlated 1D sliding" does include small 3D excursions of the protein into water,
but they are small in the sense that they do not go beyond the cross-over correlation
distance and, therefore, re-adsorbtion after an excursion occurs on a correlated place on
DNA. Accordingly, these excursions make only a marginal contribution to the sliding time,
which is correctly estimated as $\sim \lambda^2/D_1$.)

FIG. 5: Scaling regimes for the case $d > 1$. The major new feature of this diagram compared to previous ones is the presence of
regimes K and M. In these regimes the majority of proteins are not adsorbed, but still the dominant 3D transport mechanism
is the sliding of minority proteins along the DNA network, because it is so much faster ($D_1 > D_3$). We skip J and L in labelling
regimes to avoid confusion with rate $J$ and DNA length $L$. In this figure, as well as in the other figures, to make formulae look
shorter, all lengths are measured in the units of $b$, meaning that $L$, $p$, and $v$ stand for $L/b$, $p/b$, and $v/b^3$.

The final part of the argument is most clearly formulated by Bruinsma in Ref. [11]: since
subsequent tours of 1D sliding occur over uncorrelated parts of DNA, the full search requires
about $L/\lambda$ rounds. Therefore, the total search time $\tau$ can be written as

$$ \tau \sim \frac{L}{\lambda} \left[\frac{\lambda^2}{D_1} + \frac{r^2}{D_3}\right]. \tag{27} $$

Equations (26) and (27) solve the problem for all regimes of globular DNA if we remember
that the mesh (or blob) size $r$ is given by formula (18). Notice that formula (26) gives a
new interpretation to the line $y \sim v/Lb^2$ on any of our diagrams, Figs. 2, 4, 5: for
parameters below this line most of the overall search time is spent in 3D diffusion, while
for parameters above the line the major time-consuming part is 1D sliding. It is close to
this line that the result of Ref. [13] applies and these two times are of the same order.
Let us recall that it is also close to this line that the maximal possible rate is achieved
(see section IV D).

Thus, four regimes H, I, K, and M result from two 
possibilities for r in Eq. (18) (straight or Gaussian DNA 
within a mesh) and two possibilities of either first or sec- 
ond term dominance in formula (27). 
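
The equivalence between the single-protein picture and the stationary picture can be checked numerically. The sketch below is illustrative only: it computes the search time from (26) and (27) and compares it with the inverse stationary rate at $c = 1/v$, where the rate is taken from the network balance $J \sim D_3 c_{\rm free} r \lambda/g$ with $\lambda^2 \sim b^2 y d\, g/r$. It assumes adsorbtion equilibrium with particle counting, $c_{\rm free}/c = v/(v + yLb^2)$, drops all prefactors, and uses hypothetical sample values with $g/r^3 = L/v$.

```python
def search_time(y, L, v, r, D1, D3, b=1.0):
    """Mean search time from formulae (26) and (27), prefactors dropped."""
    lam = r * (D1 / D3 * y * L * b ** 2 / v) ** 0.5    # antenna length from (26)
    return (L / lam) * (lam ** 2 / D1 + r ** 2 / D3)   # formula (27)


def inverse_rate(y, L, v, r, g, D1, D3, b=1.0):
    """1/J at c = 1/v from the stationary network balance."""
    d = D1 / D3
    free_fraction = v / (v + y * L * b ** 2)           # c_free / c
    lam = b * (y * d * g / r) ** 0.5                   # stationary antenna length
    J = D3 * (free_fraction / v) * r * lam / g         # delivery rate with c = 1/v
    return 1.0 / J


# Hypothetical Gaussian-mesh example in units of b: L = 1e5, v = 1e9, g = 6400, r = 400.
tau = search_time(1.0e8, 1.0e5, 1.0e9, 400.0, 1.0, 1.0)
inv = inverse_rate(1.0e8, 1.0e5, 1.0e9, 400.0, 6400.0, 1.0, 1.0)
print(f"tau ~ {tau:.5g}, 1/J ~ {inv:.5g}")
```

The two estimates agree whenever $g$ and $r$ satisfy the density condition $g/r^3 \sim L/v$, which is what makes the two languages interchangeable at the scaling level.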

Let us now turn to the regimes A, B, C, and D, when DNA is a coil. In this case, we still
essentially rely on equations similar to (26) and (27), except that some effort is now needed
to understand the time of 3D diffusion. Our argument for this case starts from noticing that
there is a cross-over spatial scale $\xi$, such that correlated sliding takes place inside
the scale $\xi$, while regular 3D diffusion in water occurs on a larger length scale, as it
breaks correlations between desorbtion and subsequent re-adsorbtion. Thus, the time of one
tour of 3D diffusion is the mean first passage time into any one of the $L/\lambda$ balls of
size $\xi$ (here $\lambda$ is the contour length of DNA accommodated by one ball of size
$\xi$; once again, we pretend that we do not know $\xi$ and $\lambda$, and we will re-derive
them in this single-protein language). The arrival time into one such ball is the
Smoluchowski time (discussed in the appendix A) for a target of size $\xi$; it is about
$v/D_3\xi$. The arrival time into any one of the $L/\lambda$ balls is $L/\lambda$ times
smaller: $\sim v/D_3\xi(L/\lambda)$. In order to present our equations for $\lambda$ and the
overall search time $\tau$ in a form similar to Eqs. (26) and (27), we define the distance
$r_{\rm eff}$ such that $r_{\rm eff}^2 = D_3\,[v/D_3\xi(L/\lambda)] = v\lambda/L\xi$, and then
we obtain

$$ \frac{\lambda^2/D_1}{r_{\rm eff}^2/D_3} \sim \frac{y L b^2}{v} \tag{28} $$

and

$$ \tau \sim \frac{L}{\lambda} \left[\frac{\lambda^2}{D_1} + \frac{r_{\rm eff}^2}{D_3}\right]. \tag{29} $$



Once again, remembering the two regimes for the relation between $\lambda$ and $\xi$,
formula (6), and having either the first or the second term dominate in the total time (29),
we recover the four regimes A, B, C, and D.

Finally, the results for all saturation regimes E, F, G are recovered by replacing the
antenna length $\lambda$ with $L$ in equation (27) or (29), and replacing equality with
inequality in the conditions (26) or (28).
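
As a worked illustration of this recipe, regime B can be recovered explicitly. The sketch assumes the coiled-antenna relation $\xi \sim (\lambda p)^{1/2}$ (the $\lambda > p$ branch of the relation between $\lambda$ and $\xi$), few adsorbed proteins so that the 3D term dominates the total time, and $J_s \sim D_3 b/v$ for a single protein ($c = 1/v$, prefactors dropped).

```latex
\begin{align*}
r_{\rm eff}^2 &= \frac{v\lambda}{L\xi} \sim \frac{v\,\lambda^{1/2}}{L\,p^{1/2}}
   \qquad \left(\xi \sim \sqrt{\lambda p}\right), \\
\frac{\lambda^2}{D_1} &\sim \frac{y L b^2}{v}\,\frac{r_{\rm eff}^2}{D_3}
   \;\Rightarrow\; \lambda^{3/2} \sim y d\, b^2 p^{-1/2}
   \;\Rightarrow\; \lambda \sim (yd)^{2/3}\, p^{-1/3}\, b^{4/3}, \\
\tau &\sim \frac{L}{\lambda}\,\frac{r_{\rm eff}^2}{D_3} = \frac{v}{D_3\,\xi}
   \;\Rightarrow\;
\frac{J}{J_s} \sim \frac{1/\tau}{D_3 b / v} \sim \frac{\xi}{b}
   \sim \left(\frac{y p d}{b}\right)^{1/3}.
\end{align*}
```

Both the antenna length and the rate agree with the regime B entries of Table I.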



B. Comparison with earlier theoretical works 

Let us now compare our findings with various statements found in the literature. The most
widely known result of the classical work [7] was the prediction, later confirmed
experimentally [8], that the rate depends on $y$ (controlled by ionic strength) in a
characteristic way, exhibiting a maximum followed by a plateau. We have recovered this as a
possible scenario for some combinations of parameters (regimes), as shown in Fig. 3. However,
we also found a number of additional features not noticed previously: first, the maximum is
in many cases asymmetric; second, the scaling of the rate dependence on $y$ exhibits rich
behavior, with the possibilities of crossing over from $y^{1/2}$ to $y^{1/3}$ on the way to
the maximum, or from $y^{-2/3}$ to $y^{-1/2}$ on the way down; third, there is a possibility
of very strong deceleration at large adsorbtion strength $y$ compared to the Smoluchowski
rate. All these features have a simple qualitative explanation: the rate grows because
increasing $y$ increases the antenna; the rate decays when most of the proteins are
fruitlessly adsorbed far from the target (or, in other language, every protein spends most
of the time adsorbed far away); the rate saturates and comes to the plateau because the
antenna becomes as long as the DNA itself. All of these features are the direct consequence
of the fractal properties of DNA, in either the coil or the globule state.

The work Ref. [11] represents a review of a variety of topics related to protein-DNA
interactions, and the issue of search rate is considered only briefly. In this context, the
work Ref. [11] provides an important insight, used above in presenting the formula (27), that
subsequent rounds of 1D search are performed on uncorrelated pieces of DNA. In other words,
there exists a cross-over from mostly correlated events, earlier combined into one
"correlated sliding length $\lambda$", to mostly uncorrelated ones. In accord with this
insight, the search time is linear in DNA length in the regime I.

In the paper Ref. [12] the antenna length was explicitly identified with the sliding
distance (that is, with the bare sliding distance, earlier in this paper denoted as
$\ell_{\rm slide} \sim b\sqrt{yd}$), and then essentially formula (27) was used to determine
the search time. This approach is perfectly valid as long as the antenna is straight,
$\lambda = \xi$ and $\lambda = \ell_{\rm slide}$; it predicts the symmetric maximum of the
$J(y)$ dependence, but it should not be used when the DNA antenna is coiled. For the globular
DNA, the approximation of a straight antenna, implicit in the identification of $\lambda$
with the bare $\ell_{\rm slide}$, is valid for the right end of the regime A and for the
regime D, while of course other globular regimes require going beyond this approximation.

The main emphasis of the article Ref. [13] is on the role of the non-uniform sequence of
DNA, which may lead to either the non-specific adsorbtion strength $y$, or the 1D diffusion
coefficient $D_1$, or both being "noisy" functions of the coordinate on DNA. In their review
of the uniform homopolymer case, the authors of Ref. [13] employ a formula equivalent to our
Eqs. (27) or (29), but instead of a condition like (26) or (28) they minimize the overall
time with respect to $\lambda$. As we pointed out before, this approach is valid within the
cross-over corridor around the line $y \sim v/Lb^2$. In general, the idea of applying a
variational principle is very interesting. It can be generalized beyond the above-mentioned
corridor if one minimizes the overall dissipation, which is equivalent to energy minimization
in terms of the electrostatic analogy, as we show in appendix B. Of course, minimization of
dissipation is equivalent to the diffusion equation as long as diffusion is linear.
Alternatively, one can also think, as emphasized in the work Ref. [12], that the search
mechanism was subject to optimization by biological evolution. To employ this idea, it is
obviously necessary first to understand the possible search scenarios, or regimes, existing
in physics, and then, at the next stage, one could attempt optimization with respect to the
parameters, such as DNA packing properties etc., which could be subject to selective pressure
in evolution.

BWH [7] and some subsequent authors treated DNA solution in terms of domains. Although this
term was never particularly clearly defined, it could be understood as the space regions more
or less occupied by separate DNA coils in solution. With such an understanding, the
terminology of domains can be used as long as the DNA coil fits into the volume $v$, or, in
terms better suited to an in vitro experiment, as long as the DNA solution is dilute, such
that DNA coils do not overlap. The terminology of DNA domains becomes unsatisfactory at
larger DNA concentrations.

Work Ref. [10] considered the stochastic approach, which means they did not look at the
stationary diffusion, but rather at the trajectory of a single protein. As we pointed out
before, these approaches must be equivalent as long as one is only interested in the average
time of the arrival of the first of the proteins. The important contribution of the work
Ref. [10] was the elucidation of the crucial neglect of the correlations between the
desorbtion point of a protein and its re-adsorbtion point. It is because of this crucial and
not always justified approximation that previous theories appear to have overlooked the
mechanism of correlated re-adsorbtion, which is entirely due to the DNA being a polymer and
a fractal coil. Correlated re-adsorbtion was anticipated in the experimental works [6].



C. Experimental situation 

Most of the experiments in the field (see review [6] 
and references therein) involve various ingenious arrange- 
ments of two or more target sites on the linear or ring 
DNA and observation of the resulting enzyme processiv- 
ity. In the light of our theory it would be interesting 
to revive the earlier BWH-style experiments and to look 
carefully at the theoretically predicted multiple features 
of J(y) curves, such as asymmetric maximum, various 
scaling regions, the possible deceleration, etc. 

The seeming difficulty is that all our "interesting"
regimes start when $y > p^2/(b^2 d)$, when the antenna is
longer than the DNA persistence length. Since the
persistence length of dsDNA, p, is fairly large, about 150
base pairs under usual ionic conditions (say, [Na] = 0.2 M),
and assuming b is about the diameter of the double helix,
we get $p/b \approx 25$ for the dsDNA. Unless d is large,
this seems to require fairly large non-specific adsorbtion
energies, about $6 k_B T$ to $10 k_B T$, which is a lot but
not impossible. In any case, we would like to emphasize
that the maximum of J(y) has been observed [8], which,
according to our theory, could have happened only at
$y > p^2/(b^2 d)$, thus assuring that this range is within
reach.
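
As a rough numerical illustration of this estimate, the threshold $y > p^2/(b^2 d)$ can be converted into a minimal adsorbtion energy. The sketch below assumes the Boltzmann relation $y = \exp(\epsilon/k_B T)$ between y and the non-specific adsorbtion energy $\epsilon$; the function name and the sample values of d are illustrative choices, not experimental values.

```python
import math

def threshold_energy_kT(p_over_b: float, d: float) -> float:
    """Minimal adsorbtion energy, in units of k_B T, at which the antenna
    exceeds the persistence length, i.e. at which y = (p/b)^2 / d.

    Assumption of this sketch: y = exp(epsilon / k_B T).
    """
    if p_over_b <= 0 or d <= 0:
        raise ValueError("p_over_b and d must be positive")
    y_threshold = p_over_b ** 2 / d
    return math.log(y_threshold)

if __name__ == "__main__":
    # Illustrative values of d = D_1/D_3 (assumed, not measured).
    for d in (1.0, 0.1, 0.02):
        eps = threshold_energy_kT(25.0, d)
        print(f"d = {d:5.2f}: epsilon > {eps:.1f} k_B T")
```

With p/b of about 25, the threshold is between 6 and 7 $k_B T$ for d = 1 and reaches about 10 $k_B T$ only when d is as small as a few hundredths, consistent with the range quoted above.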

One of the most critical and poorly known parameters
of our theory is $d = D_1/D_3$. Of course, $D_3$, the
diffusion coefficient of the protein in water, is known
pretty well, and can be simply estimated based on its size
using the Stokes-Einstein relation. The difficult part is
$D_1$, which involves friction of the protein against DNA
in the solvent. It is clear that slow diffusion along DNA
would make the entire mechanism of 1D sliding less
efficient, and indeed decreasing d systematically reduces
the rate that we obtain in almost all regimes. There are
only two exceptions to this. One is trivial: it is the pure
Smoluchowski process, which involves no sliding and is
realized only when there is no non-specific adsorbtion on
DNA (y < 1). The other exception is in the regimes E and F,
in which the entire DNA, rod-like or coil-like, serves as
an antenna. This means that 3D transport to the DNA is the
slowest part, the bottleneck of the whole process, so that
reducing d does not do any damage, except, of course, for
shifting the corresponding regime boundaries.

Experimental data on the 1D diffusion of proteins
along DNA are scarce and not completely clear [17].

An interesting spin on the whole issue of 1D transport
is added by proteins such as helicases which, provided
with a proper energy supply, can move actively. In the
context of our present theory, active movement is likely
to correspond to a great increase of $D_1$, or d, either
for the actively moving proteins themselves or for
passively diffusing proteins which might receive a push or
pull from active ones. At first glance, this sounds like a
paradoxical statement, because active motion is not
diffusion, in the sense that displacement is linear in
time. However, this is only true up to certain time and
length scales. At larger scales, we can reasonably assume
that it would be diffusion again, albeit with a vastly
increased diffusion coefficient. Indeed, first, there is
always a probability of thermally activated detachment
from DNA, and, second, given that the two strands in DNA
are antiparallel, the re-adsorbtion is likely to lead to a
random choice of direction of further sliding. These two
ingredients surely correspond to diffusion, in the sense
that displacement goes like $t^{1/2}$. Of course, this
entire issue of active transport requires further
investigation, which naturally brings us to the conclusion
of this paper.
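
The crossover from directed motion to diffusion can be illustrated with a minimal toy model, which is an assumption of this sketch and not part of the theory: a protein slides with constant speed v and re-draws its direction (left or right with equal probability) at random moments occurring with rate 1/tau. For such a process the velocity correlation decays as exp(-t/tau) and the mean square displacement is known in closed form. The names v, tau, msd_active and effective_d1 are hypothetical.

```python
import math

def msd_active(t: float, v: float, tau: float) -> float:
    """Mean square displacement of a 1D walker that moves at speed v and
    re-draws its direction at rate 1/tau (assumed toy model).

    MSD(t) = 2 v^2 tau [t - tau (1 - exp(-t/tau))].
    """
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
    return 2.0 * v**2 * tau * (t + tau * math.expm1(-t / tau))

def effective_d1(v: float, tau: float) -> float:
    """Long-time diffusion coefficient, defined through MSD ~ 2 D t."""
    return v**2 * tau

if __name__ == "__main__":
    v, tau = 1.0, 1.0  # arbitrary units
    for t in (0.01, 0.1, 1.0, 10.0, 100.0, 1000.0):
        rms = math.sqrt(msd_active(t, v, tau))
        print(f"t = {t:8.2f}  rms displacement = {rms:9.3f}")
```

In this model the root mean square displacement grows linearly in t for t much shorter than tau and as $t^{1/2}$ for t much longer than tau, with an effective coefficient $v^2 \tau$ that can greatly exceed the passive value when runs are long.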



VI. CONCLUSION 

Many questions remain open. The role of concurrent 
protein species, the role of non-uniform DNA sequence, 
the role of DNA motion [18], the probability of unusually 
long search times, the search on a single stranded DNA or 
RNA, the role of superhelical structures, the dependence 
of rate (or search time) on the specific positions of one 
or more targets on DNA, the related issue of enzyme 
processivity, the role of excluded volume for very long 
DNA and corresponding loop-erasing walks [19] - all of 
these questions invite theoretical work. 

To conclude, we have analyzed all scaling regimes of 
the diffusion-controlled search by proteins of the specific 
target site located on DNA. We found many regimes. The 
major idea can be formulated in terms of the cross-over 
between 1D sliding along DNA up to a certain length
scale and 3D diffusion in surrounding space on the larger 
length scale. Overall, qualitatively, this idea seems to be 
in agreement with the intuition expressed in experimen- 
tal papers. In addition, we have made several theoretical 
predictions which are verifiable and (even more impor- 
tantly) falsifiable by the experiments. We are looking 
forward to such experiments. 

APPENDIX A: SIMPLE SCALING DERIVATION 
OF THE SMOLUCHOWSKI RATE AND THE 
SMOLUCHOWSKI TIME 

Classical Smoluchowski theory [1] treats the
diffusion-controlled process of irreversible absorbtion of
diffusing particles by an immobile sphere of a given
radius, call it b. As in our protein problem, Smoluchowski
theory can be formulated either in terms of the stationary
rate $J_s$, assuming the concentration c is fixed, or in
terms of the mean first passage time $t_s$ for a single
protein.

Let us imagine that a protein diffuses within a volume
v, and its diffusion coefficient is $D_3$. Let us further
define the time interval $t_b$ such that over time $t_b$
the protein moves a distance of order b:
$D_3 t_b \sim b^2$. Then, over a longer time t the protein
visits $t/t_b$ spots of size b each, and, given that
$b^3 \ll v$, the probability that none of these spots is
the target, i.e. the probability to keep missing the
target for the time t, obeys the Poisson distribution and
decays exponentially with t:
$(1 - b^3/v)^{t/t_b} \simeq \exp[-t b^3/(v t_b)]$. The mean
first passage time can be read off from this formula:
$t_s \sim v t_b/b^3 \sim v/(D_3 b)$.

The corresponding stationary rate is obtained by
inverting this time, assuming an overall protein
concentration $c = 1/v$. Thus, $J_s \sim D_3 c b$.
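
The scaling argument above is easy to check numerically. The following sketch compares the exact miss probability $(1 - b^3/v)^{t/t_b}$ with its exponential approximation and evaluates the resulting scaling estimates. All numerical values are illustrative, in arbitrary units, and numerical prefactors of order unity are ignored throughout; the function names are hypothetical.

```python
import math

def smoluchowski_time(v: float, d3: float, b: float) -> float:
    """Scaling estimate t_s ~ v / (D_3 b), prefactor omitted."""
    return v / (d3 * b)

def miss_probability(t: float, v: float, d3: float, b: float) -> float:
    """Probability that none of the t/t_b visited spots is the target."""
    t_b = b**2 / d3  # time to diffuse over a distance of order b
    return (1.0 - b**3 / v) ** (t / t_b)

if __name__ == "__main__":
    v, d3, b = 1.0e6, 1.0, 1.0  # arbitrary units, chosen so that b^3 << v
    t_s = smoluchowski_time(v, d3, b)
    for t in (0.1 * t_s, t_s, 3.0 * t_s):
        exact = miss_probability(t, v, d3, b)
        approx = math.exp(-t / t_s)
        print(f"t/t_s = {t / t_s:3.1f}: exact {exact:.6f}, "
              f"exponential {approx:.6f}")
    c = 1.0 / v  # one protein per volume v
    print("J_s ~ D_3 c b =", d3 * c * b, " 1/t_s =", 1.0 / t_s)
```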

Of course, a more accurate derivation, available in a
number of textbooks (and easily formulated in terms of
the electrostatic analogy, see Appendix B), is necessary
to complement the result with the correct prefactor of
$4\pi$.

APPENDIX B: ELECTROSTATIC ANALOGY 

Here, we re-derive the results of Section II using
the fact that the stationary diffusion equation is the
same as the Laplace equation in electrostatics.
Specifically, the problem of diffusion into a target of
size b is equivalent to the problem of finding the electric
field around a charge of size b. The key and relatively
non-trivial point of this analogy is to realize that a
potential well for diffusing particles is equivalent, in
electrostatic language, to a region of space with a very
high dielectric constant. In our case the potential well
is located all around the DNA, and the target is also
somewhere on the DNA. Therefore, it is equivalent to the
electrostatic problem in which we have a channel, of
diameter about b, filled with a high dielectric constant
material, for instance water, and surrounded by a low
dielectric constant material.



Specifically, it is easy to check that y of the diffusion
problem is exactly equivalent to $\epsilon_w/\epsilon_m$,
the ratio of the dielectric constants of water and the
surrounding medium: $y = \epsilon_w/\epsilon_m > 1$.

Thus, we have to address the problem of a charge Q
located inside a water-filled channel in, let us say, a
thick lipid membrane. For the straight channel, this is a
well known problem in membrane biophysics. It was first
studied by Parsegian [20], and the most detailed recent
exposition is given in the article [21]. Here, we give
only a simple scaling consideration.

Since $\epsilon_w/\epsilon_m \gg 1$, field lines prefer to
remain inside the channel for as long as possible. This
gives a picture of the electric field equivalent to
Fig. 1, a or b. In other words, there is some length scale
$\lambda$ along the channel, and within this scale electric
field lines are predominantly confined in the channel. At
the same time, outside of the sphere of radius $\xi$, the
electric field is close to that of a spherical charge in
unrestricted space. Thus, the electric field energy can be
approximated as the sum of two parts, one due to the
uniform field in the volume of about $b^2 \lambda$ in the
channel, and the other around the $\xi$-sphere in the
medium. Since the E-field in the channel is about
$Q/(b^2 \epsilon_w)$ while the D-field is $Q/b^2$, the part
of the energy due to the field inside the channel is about
$(Q/b^2 \epsilon_w) \times (Q/b^2) \times (b^2 \lambda) = Q^2 \lambda/(b^2 \epsilon_w)$.
At the same time, the energy of the field in the outer zone
is about $Q^2/(\xi \epsilon_m)$. Thus, the total
electrostatic energy (self-energy of the charge Q) is

$$U \sim \frac{Q^2 \lambda}{b^2 \epsilon_w} + \frac{Q^2}{\xi \epsilon_m}. \qquad \text{(B1)}$$

To begin with, let us assume that the channel is
straight. Then $\lambda = \xi$, and minimization of the
energy (B1) gives
$\lambda \sim b \sqrt{\epsilon_w/\epsilon_m} \gg b$. This
formula can be found in the book Ref. [22]. Given that
$y = \epsilon_w/\epsilon_m$, this formula is equivalent to
our result for the antenna length in the straight antenna
regime A (assuming d = 1).

Consider now a coiled channel. Such a problem was never
considered in the electrostatic context, but one can
imagine, for instance, a flexible fiber of high dielectric
constant material surrounded by air. Formula (B1) still
applies, but $\xi \sim \sqrt{\lambda p}$. Minimization then
yields
$\lambda \sim b^{4/3} p^{-1/3} (\epsilon_w/\epsilon_m)^{2/3} = b^{4/3} p^{-1/3} y^{2/3}$,
which is our result for the antenna length in the regime B.
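
Both minimizations can be verified numerically. The sketch below, in units where $Q^2/\epsilon_m = 1$, minimizes the energy $Q^2 \lambda/(b^2 \epsilon_w) + Q^2/(\xi \epsilon_m)$ over $\lambda$ by a simple golden-section search, with $\xi = \lambda$ for the straight channel and $\xi = \sqrt{\lambda p}$ for the coiled one. The parameter values and function names are illustrative assumptions; only the scaling exponents, not the prefactors, are meaningful.

```python
import math

def energy(lam, b, p, eps_ratio, coiled):
    """Self-energy in units of Q^2/eps_m; eps_ratio = eps_w/eps_m = y."""
    xi = math.sqrt(lam * p) if coiled else lam
    return lam / (b**2 * eps_ratio) + 1.0 / xi

def argmin_golden(f, lo, hi, tol=1e-9):
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol * (abs(lo) + abs(hi)):
        x1 = hi - g * (hi - lo)
        x2 = lo + g * (hi - lo)
        if f(x1) < f(x2):
            hi = x2  # minimum lies in [lo, x2]
        else:
            lo = x1  # minimum lies in [x1, hi]
    return 0.5 * (lo + hi)

def antenna_length(b, p, eps_ratio, coiled):
    """Length lambda that minimizes the energy for the given geometry."""
    return argmin_golden(lambda lam: energy(lam, b, p, eps_ratio, coiled),
                         1e-3 * b, 1e9 * b)

if __name__ == "__main__":
    b, p = 1.0, 25.0  # illustrative values, p/b = 25
    for y in (1e4, 1e5, 1e6):
        straight = antenna_length(b, p, y, coiled=False)
        coiled = antenna_length(b, p, y, coiled=True)
        print(f"y = {y:.0e}: "
              f"straight / (b y^(1/2)) = {straight / (b * math.sqrt(y)):.3f}, "
              f"coiled / (b^(4/3) p^(-1/3) y^(2/3)) = "
              f"{coiled / (b**(4/3) * p**(-1/3) * y**(2/3)):.3f}")
```

The printed ratios do not depend on y, which confirms the exponents 1/2 and 2/3; the constant in the coiled case differs from unity only by a numerical factor that the scaling argument does not track.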

To conclude, we note that minimization of energy in 
the electrostatic language is translated to minimization 
of dissipation in the diffusion language. 